159 research outputs found

    SENSEVAL, a computational approach to meaning

    Identity, non-identity, and near-identity: Addressing the complexity of coreference

    This article examines the mainstream categorical definition of coreference as "identity of reference." It argues that coreference is best handled when identity is treated as a continuum ranging from full identity to non-identity, with room for near-identity relations to explain currently problematic cases. This middle ground is needed to account for linguistic expressions in real text that stand in relations that are neither full coreference nor non-coreference, a situation that has led to contradictory treatment of such cases in previous coreference annotation efforts. We discuss key issues for coreference such as conceptual categorization, individuation, criteria of identity, and the discourse model construct. We redefine coreference as a scalar relation between two (or more) linguistic expressions that refer to discourse entities considered to be at the same granularity level relevant to the linguistic and pragmatic context. We view coreference relations in terms of mental space theory and discuss a large number of real-life examples that show different degrees of near-identity.

    A classification of Spanish psychological verbs

    This paper is set in the context of research currently being carried out in the field of Computational Lexicography at the Linguistics Department of the University of Barcelona, in collaboration with the Computer Science Department of the University of Maryland, within a project provisionally called PIRAPIDES. The research deals with verbal diathesis, subcategorization frames and S-grids, and with the definition of a typology of S-roles suitable for describing argument structure.

    Comparing distributional semantic models for identifying groups of semantically related words

    Distributional Semantic Models (DSMs) are growing in popularity in Computational Linguistics. DSMs use corpora of language use to automatically induce formal representations of word meaning. This article focuses on one application of DSMs: identifying groups of semantically related words. We compare two models for obtaining formal representations: a well-known approach (CLUTO) and a more recently introduced one (Word2Vec). We compare the two models with respect to the part-of-speech (PoS) coherence and the semantic relatedness of the words within the obtained groups. We also propose a way to improve the results obtained by Word2Vec through corpus preprocessing. The results show that (a) CLUTO outperforms Word2Vec on both criteria for medium-sized corpora, and (b) preprocessing substantially improves the results of Word2Vec on both criteria.
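
    As a rough illustration of the grouping task (not the CLUTO or Word2Vec setup compared in the article), the sketch below groups words whose vectors have a cosine similarity at or above a threshold. The toy vectors, the threshold, and the greedy single-pass strategy are all assumptions chosen for the example; in practice the vectors would be induced from a corpus by a DSM.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def group_words(vectors, threshold):
    """Greedy single-pass grouping: each word joins the first group whose
    seed word is at least `threshold` similar; otherwise it starts a new group."""
    groups = []  # list of (seed_word, members)
    for word, vec in vectors.items():
        for seed, members in groups:
            if cosine(vectors[seed], vec) >= threshold:
                members.append(word)
                break
        else:
            groups.append((word, [word]))
    return [members for _, members in groups]


# Hypothetical 3-dimensional vectors standing in for corpus-induced ones.
toy_vectors = {
    "dog": [0.9, 0.1, 0.0],
    "cat": [0.8, 0.2, 0.1],
    "car": [0.0, 0.9, 0.4],
    "truck": [0.1, 0.8, 0.5],
}

print(group_words(toy_vectors, threshold=0.9))
# → [['dog', 'cat'], ['car', 'truck']]
```

    Because each word is compared only with the seed of each existing group, the result depends on word order; a dedicated clustering tool would instead optimize a global criterion over all words.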

    Language technologies and their applications

    In recent years, research in Computational Linguistics and Natural Language Processing has given rise to Language Technologies, whose main goal is to develop computer systems capable of recognizing, understanding and generating human language in all its forms. To this end, several applications have been developed, such as Machine Translation, Information Retrieval, Information Extraction and Document Classification. These applications process language to facilitate access to, organization of, and transmission of the knowledge generated by the so-called Information Society. As in other disciplines, Computational Linguistics and Natural Language Processing have moved from an initial period of basic, experimental research to one of greater interaction with society, focused on creating products and applications that solve real problems. This means developing systems and resources capable of dealing with unrestricted language, that is, broad-coverage systems and resources. This paper gives an introduction to the linguistic resources and the main applications currently being developed within the Language Technologies framework. Among resources, it focuses on morphological analyzers, taggers, syntactic parsers, computational lexicons and annotated corpora. As for applications, it concentrates on Information Retrieval, Information Extraction and Machine Translation.

    WRPA: A system for relational paraphrase acquisition from Wikipedia

    In this paper we present WRPA, a system for Relational Paraphrase Acquisition from Wikipedia. WRPA takes advantage of Wikipedia's structure to extract paraphrasing patterns that express a particular relation between two entities. The novelty of the system is that its exploitation of Wikipedia goes beyond infoboxes to reach the itemized information embedded in Wikipedia pages. WRPA is language-independent, provided that a Wikipedia and shallow linguistic tools exist for the language in question, and it is also independent of the relation addressed.
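
    The general idea of harvesting relational patterns from known entity pairs can be sketched as follows. This is a hypothetical simplification, not WRPA's actual procedure: the seed pairs stand in for pairs that WRPA would obtain from Wikipedia structure (infoboxes or itemized lists), and the pattern is simply the raw text between the two mentions, with no shallow linguistic processing.

```python
from collections import Counter


def extract_pattern(sentence, e1, e2):
    """Return the text between mentions of e1 and e2 (in that order) as a
    pattern with [X]/[Y] placeholders, or None if either mention is missing."""
    start = sentence.find(e1)
    if start == -1:
        return None
    end = sentence.find(e2, start + len(e1))
    if end == -1:
        return None
    middle = sentence[start + len(e1):end].strip()
    return f"[X] {middle} [Y]"


def collect_patterns(sentences, seed_pairs):
    """Count candidate patterns over every sentence and seed entity pair."""
    counts = Counter()
    for sentence in sentences:
        for e1, e2 in seed_pairs:
            pattern = extract_pattern(sentence, e1, e2)
            if pattern is not None:
                counts[pattern] += 1
    return counts


# Hypothetical seed pairs for a "birthplace" relation and toy sentences.
seed_pairs = [("Marie Curie", "Warsaw"), ("Alan Turing", "London")]
sentences = [
    "Marie Curie was born in Warsaw.",
    "Alan Turing was born in London.",
    "Marie Curie moved from Warsaw to Paris.",
]

print(collect_patterns(sentences, seed_pairs).most_common(1))
# → [('[X] was born in [Y]', 2)]
```

    Counting how often a pattern recurs across different seed pairs is one simple way to separate recurring expressions of the relation from incidental text such as "[X] moved from [Y]".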

    Paraphrase concept and typology. A linguistically based and computationally oriented approach

    In this paper, we present a critical analysis of the state of the art in the definition and typologies of paraphrasing. This analysis shows that no existing characterization of paraphrasing is at once comprehensive, linguistically based and computationally tractable. We then define and delimit the concept on the basis of propositional content, and present a general, inclusive and computationally oriented typology of the linguistic mechanisms that give rise to form variations between paraphrase pairs.

    Text as Scene: Discourse Deixis and Bridging Relations

    This paper presents a new framework, "text as scene", which lays the foundations for the annotation of two coreferential links: discourse deixis and bridging relations. The incorporation of what we call textual and contextual scenes yields more flexible annotation guidelines in which broad type categories are clearly differentiated. By handling discourse deixis and bridging relations from a common perspective, the framework aims to improve the poor reliability scores obtained by previous annotation schemes, which fail to capture the vague references inherent in both types of link. The guidelines presented here complete the annotation scheme designed to enrich the Spanish CESS-ECE corpus with coreference information, thus building the CESS-Ancora corpus.

    Intensive use of lexicon and corpus for WSD

    The paper addresses the issue of how to use linguistic information in Word Sense Disambiguation (WSD). We introduce a knowledge-driven, unsupervised WSD method that requires only a large corpus previously tagged with part of speech and very little grammatical knowledge. The WSD process takes into account the syntactic patterns in which the ambiguous occurrence appears, relying on the hypothesis of "almost one sense per syntactic pattern". This integration allows us to obtain from corpora paradigmatic and syntagmatic information related to the ambiguous occurrence. We use variants of the EuroWordNet (EWN) information associated with word senses and two WSD algorithms. We report the results obtained when applying the method to the Spanish lexical sample task of Senseval-2. The methodology is easily transferable to other languages.
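
    A minimal sketch of the "almost one sense per syntactic pattern" idea, under stated assumptions: the corpus has already been reduced to (pattern, filler) pairs, and each sense is represented by a hypothetical list of related words standing in for EuroWordNet information. The scoring is a plain overlap count, not either of the two WSD algorithms reported in the paper.

```python
from collections import Counter


def pattern_fillers(corpus_patterns, pattern):
    """Count the words that fill the ambiguous slot of `pattern` in the corpus."""
    return Counter(word for pat, word in corpus_patterns if pat == pattern)


def disambiguate(pattern, senses, corpus_patterns):
    """Choose the sense whose related words occur most often in the same
    syntactic pattern, assuming roughly one sense per pattern."""
    fillers = pattern_fillers(corpus_patterns, pattern)

    def score(sense):
        # Counter returns 0 for words never seen in this pattern.
        return sum(fillers[word] for word in senses[sense])

    return max(senses, key=score)


# Hypothetical sense inventory for Spanish "banco" (seat vs. financial
# institution); the related-word lists stand in for EuroWordNet information.
senses = {
    "banco_asiento": ["silla", "sofá", "taburete"],   # seat-like words
    "banco_entidad": ["caja", "entidad", "oficina"],  # institution-like words
}

# Toy corpus reduced to (syntactic pattern, slot filler) pairs.
corpus_patterns = [
    ("sentarse en _", "silla"),
    ("sentarse en _", "sofá"),
    ("sentarse en _", "silla"),
    ("ingresar dinero en _", "caja"),
]

print(disambiguate("sentarse en _", senses, corpus_patterns))
# → banco_asiento
```

    If a pattern never occurs in the corpus, every score is 0 and `max` simply returns the first sense listed, so a fuller system would need a fallback strategy for unseen patterns.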